Modeling Co-articulation in Text-to-Audio Visual Speech
Authors
Abstract
This paper presents our approach to co-articulation for a text-to-audio-visual speech synthesizer (TTAVS), a system for converting input text into a video-realistic audio-visual sequence. It is an image-based system that models the face using a set of images of a human subject. Visual speech can be modeled by concatenating visemes, the lip shapes corresponding to phonemes. In actual speech production, however, the discrete units of speech, syllables and phonemes, are not produced in isolation: the vocal tract motions associated with producing one phonetic segment overlap the motions for producing surrounding segments. This overlap is called co-articulation. The lack of parameterization in the image-based model makes it difficult to apply the co-articulation techniques used in 3D models. We introduce a method based on polymorphing to incorporate co-articulation in our TTAVS, and we add temporal smoothing of viseme transitions to avoid jerky animation.
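To make the idea concrete, the sketch below shows one simple way co-articulation and temporal smoothing could be combined when generating a viseme trajectory: each viseme target is pulled toward its neighbours before interpolation, and the resulting frame sequence is smoothed over time. This is a minimal illustrative sketch only; the feature vectors, dominance weight, and smoothing window are assumptions for illustration and do not reproduce the paper's image-based polymorphing.

```python
# Illustrative sketch only: blends neighbouring viseme targets to mimic
# co-articulation and smooths the blend over time. The viseme vectors,
# dominance weight, and smoothing window are assumptions, not the
# paper's actual polymorphing over face images.
import numpy as np

def coarticulated_trajectory(visemes, frames_per_viseme=10,
                             dominance=0.6, smooth_window=5):
    """visemes: list of 1-D feature vectors (e.g. lip-shape parameters),
    one per phoneme. Returns an array of per-frame blended targets."""
    keyframes = []
    for i, v in enumerate(visemes):
        prev_v = visemes[i - 1] if i > 0 else v
        next_v = visemes[i + 1] if i < len(visemes) - 1 else v
        # Co-articulation: each target is pulled toward its neighbours.
        blended = dominance * v + 0.5 * (1 - dominance) * (prev_v + next_v)
        keyframes.append(blended)

    # Linear interpolation between consecutive blended keyframes.
    frames = []
    for a, b in zip(keyframes[:-1], keyframes[1:]):
        for t in np.linspace(0.0, 1.0, frames_per_viseme, endpoint=False):
            frames.append((1 - t) * a + t * b)
    frames.append(keyframes[-1])
    frames = np.stack(frames)

    # Temporal smoothing: moving average over frames to avoid jerky transitions.
    kernel = np.ones(smooth_window) / smooth_window
    smoothed = np.apply_along_axis(
        lambda x: np.convolve(x, kernel, mode="same"), axis=0, arr=frames)
    return smoothed

# Example: three hypothetical 2-D "lip shape" vectors for a short phoneme sequence.
traj = coarticulated_trajectory([np.array([0.0, 0.1]),
                                 np.array([1.0, 0.8]),
                                 np.array([0.2, 0.3])])
print(traj.shape)
```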
Similar articles
VTalk: A System for generating Text-to-Audio-Visual Speech
This paper describes VTalk, a system for synthesizing text-to-audio-visual speech (TTAVS), where the input text is converted into an audio-visual speech stream incorporating head and eye movements. It is an image-based system, where the face is modeled using a set of images of a human subject. A concatenation of visemes, the lip shapes corresponding to phonemes, can be used for modeling visu...
Innovations in Czech audio-visual speech synthesis for precise articulation
This paper presents new steps toward animation of precise articulation. The acquisition of an audio-visual corpus for Czech and a new method for parameterization of visual speech were designed to obtain exact speech data. The parameterization method is primarily suitable for training data-driven visual speech synthesis systems. The audio-visual corpus also includes a specially designed test part. Fur...
Cipher text only attack on speech time scrambling systems using correction of audio spectrogram
Recently, permutation multimedia ciphers were broken in a chosen-plaintext scenario. That attack models a very resourceful adversary, which may not always be the case. To show the insecurity of these ciphers, we present a ciphertext-only attack on speech permutation ciphers. We show that the inherent redundancies of speech can pave the path for a successful ciphertext-only attack. To that end, regularities ...
Talking heads - communication, articulation and animation
Human speech communication relies not only on audition but also on vision, especially under poor acoustic conditions. The face is an important carrier of both linguistic and extra-linguistic information. Using computer graphics, it is possible to synthesize faces and perform audio-visual text-to-speech synthesis, a technique that has a number of interesting applications, for example in the area of m...
Audio-Visual Speech Recognition for a Person with Severe Hearing Loss Using Deep Canonical Correlation Analysis
Recently, we proposed an audio-visual speech recognition system based on a neural network for a person with an articulation disorder resulting from severe hearing loss. In the case of a person with this type of articulation disorder, the speech style is quite different from that of people without hearing loss, making a speaker-independent acoustic model for unimpaired persons more or less usele...